Setup

library(here)
library(dplyr)
library(ggplot2)
library(scales)
library(patchwork)
library(stringr)
library(tidyr)
library(lme4)
library(lmerTest)
library(brms)

# Set up parallel processing for Bayesian models
library(future)
plan(multisession, workers = 4)

Helper functions for plots and tables:

source(here("scripts", "00_visualisation_functions.R"))

Load processed data:

d_test <- readRDS(here("data", "processed", "d_test.rds"))
add_experiment_cols <- function (data) {
  data |>
    mutate(exp_order = case_when(
      gamified_first == 0 & exp_group == "score" ~ "Control—Score",
      gamified_first == 0 & exp_group == "both" ~ "Control—Both",
      gamified_first == 1 & exp_group == "score" ~ "Score—Control",
      gamified_first == 1 & exp_group == "both" ~ "Both—Control"
    )) |>
    mutate(type = ifelse(gamified, "Gamified", "Control"))
}

Helper function for interpreting Bayes factors:

bf_to_strength <- function (bf) {
  
  direction <- "for"
  
  if (bf < 1) {
    bf <- 1/bf
    direction <- "against"
  }
  
  strength <- case_when(
    bf == 1 ~ "No",
    bf < 3 ~ "Anecdotal",
    bf < 10 ~ "Moderate",
    bf < 30 ~ "Strong",
    bf < 100 ~ "Very strong",
    TRUE ~ "Extreme"
  )
  
  paste0(strength, " evidence ", direction)
}

Does gamification change learning outcomes on the test?

Accuracy

Prepare data

d_test_acc <- d_test |>
  group_by(subject, exp_group, block, condition, gamified, gamified_first) |>
  summarise(accuracy = mean(correct))
d_test_acc_agg <- d_test_acc |>
  group_by(block, condition, gamified, gamified_first, exp_group) |>
  summarise(acc = mean(accuracy, na.rm = TRUE),
            acc_se = sd(accuracy, na.rm = TRUE)/sqrt(n())) |>
  ungroup() |>
  add_experiment_cols()

Visualise data

p_test_acc <- plot_data(d_test_acc_agg, acc, acc_se, "Accuracy") +
  scale_y_continuous(limits = c(.35, .6), labels = scales::percent_format())

p_test_acc

Fit model

Prepare data for modelling by mean-centering categorical predictors:

d_test_m <- d_test |>
  mutate(exp_group_c = ifelse(exp_group == "score", 0, 1),
         exp_group_c = exp_group_c - mean(exp_group_c),
         gamified_first_c = gamified_first - mean(gamified_first))
m_test_acc <- glmer(correct ~ gamified +
                       gamified:exp_group_c +
                       gamified:gamified_first_c +
                       gamified:gamified_first_c:exp_group_c +
                       (1 | subject) + (1 | fact),
                     family = "binomial",
                     data = d_test_m)

summary(m_test_acc)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: 
correct ~ gamified + gamified:exp_group_c + gamified:gamified_first_c +  
    gamified:gamified_first_c:exp_group_c + (1 | subject) + (1 |      fact)
   Data: d_test_m

     AIC      BIC   logLik deviance df.resid 
  8909.2   8977.8  -4444.6   8889.2     7014 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8290 -0.8456  0.3436  0.8197  3.1357 

Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 0.6635   0.8145  
 fact    (Intercept) 0.3379   0.5813  
Number of obs: 7024, groups:  subject, 152; fact, 70

Fixed effects:
                                            Estimate Std. Error z value
(Intercept)                                -0.015175   0.103341  -0.147
gamifiedTRUE                                0.008417   0.052769   0.160
gamifiedFALSE:exp_group_c                  -0.153904   0.153331  -1.004
gamifiedTRUE:exp_group_c                   -0.178017   0.153562  -1.159
gamifiedFALSE:gamified_first_c              0.170561   0.153567   1.111
gamifiedTRUE:gamified_first_c              -0.208889   0.153591  -1.360
gamifiedFALSE:exp_group_c:gamified_first_c -0.166474   0.308267  -0.540
gamifiedTRUE:exp_group_c:gamified_first_c  -0.206697   0.308355  -0.670
                                           Pr(>|z|)
(Intercept)                                   0.883
gamifiedTRUE                                  0.873
gamifiedFALSE:exp_group_c                     0.316
gamifiedTRUE:exp_group_c                      0.246
gamifiedFALSE:gamified_first_c                0.267
gamifiedTRUE:gamified_first_c                 0.174
gamifiedFALSE:exp_group_c:gamified_first_c    0.589
gamifiedTRUE:exp_group_c:gamified_first_c     0.503

Correlation of Fixed Effects:
              (Intr) gmTRUE gmfdFALSE:x__ gmfdTRUE:x__ gmfdFALSE:g__
gamifidTRUE   -0.255                                                
gmfdFALSE:x__ -0.019  0.004                                         
gmfdTRUE:x__  -0.017  0.005  0.760                                  
gmfdFALSE:g__ -0.019  0.007 -0.009        -0.007                    
gmfdTRUE:g__  -0.015  0.012 -0.007        -0.004        0.763       
gFALSE:__:_   -0.007  0.001 -0.028        -0.022       -0.025       
gTRUE:__:__   -0.006  0.006 -0.022        -0.013       -0.023       
              gmfdTRUE:g__ gFALSE:__:
gamifidTRUE                          
gmfdFALSE:x__                        
gmfdTRUE:x__                         
gmfdFALSE:g__                        
gmfdTRUE:g__                         
gFALSE:__:_   -0.023                 
gTRUE:__:__   -0.019        0.761    
print_model_table(m_test_acc)

| Effect | Estimate | SE | z-value | p-value |
|---|---|---|---|---|
| Intercept (Control) | -0.02 | 0.10 | -0.15 | .883 |
| Gamification | 0.01 | 0.05 | 0.16 | .873 |
| Type of gamification (Points-and-progress - Points) | -0.18 | 0.15 | -1.16 | .246 |
| Type of gamification on Control (Points-and-progress - Points) | -0.15 | 0.15 | -1.00 | .316 |
| Block on gamification (1 - 2) | -0.21 | 0.15 | -1.36 | .174 |
| Block on Control (2 - 1) | 0.17 | 0.15 | 1.11 | .267 |
| Type of gamification; block difference (Block 1 - Block 2) | -0.21 | 0.31 | -0.67 | .503 |
| Type of gamification on Control; block difference (Block 2 - Block 1) | -0.17 | 0.31 | -0.54 | .589 |

Visualise fitted model

p_test_acc_m <- plot_model_fit(m_test_acc, d_test_m, y_lab = "Accuracy") +
  scale_y_continuous(limits = c(.35, .6), labels = scales::percent_format(accuracy = .1))
  block           condition gamified gamified_first exp_group gamified_first_c
1     1             Control    FALSE          FALSE      both       -0.5380125
2     1             Control    FALSE          FALSE     score       -0.5380125
3     1              Points     TRUE           TRUE     score        0.4619875
4     1 Points-and-progress     TRUE           TRUE      both        0.4619875
5     2             Control    FALSE           TRUE      both        0.4619875
6     2             Control    FALSE           TRUE     score        0.4619875
7     2              Points     TRUE          FALSE     score       -0.5380125
8     2 Points-and-progress     TRUE          FALSE      both       -0.5380125
  exp_group_c  pred_val     exp_order     type
1   0.5545273 0.4644061  Control—Both  Control
2  -0.4454727 0.4804404 Control—Score  Control
3  -0.4454727 0.5046444 Score—Control Gamified
4   0.5545273 0.4366104  Both—Control Gamified
5   0.5545273 0.4839130  Both—Control  Control
6  -0.4454727 0.5415149 Score—Control  Control
7  -0.4454727 0.5337956 Control—Score Gamified
8   0.5545273 0.5171377  Control—Both Gamified
p_test_acc_m

Bayesian regression model

Fit again with brms so that we can calculate Bayes Factors. Because we expect any fixed effects to be at most moderate in size, we will use a weakly informative Normal(0, 1) prior for these effects.
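To see what "weakly informative" means here: on the log-odds scale, a Normal(0, 1) prior keeps 95% of its mass between odds ratios of roughly 1/7 and 7. A quick check (our own sketch, not part of the analysis pipeline):

```r
# 95% central interval of a Normal(0, 1) prior on the log-odds scale,
# expressed as odds ratios
exp(qnorm(c(.025, .975), mean = 0, sd = 1))
# roughly 0.14 and 7.1
```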

m_test_acc_brms <- brm(correct ~ gamified +
                         gamified:exp_group_c +
                         gamified:gamified_first_c +
                         gamified:gamified_first_c:exp_group_c +
                         (1 | subject) + (1 | fact),
                       family = "bernoulli",
                       data = d_test_m,
                       prior = set_prior("normal(0, 1)", class = "b"),
                       chains = 4,
                       iter = 11000,
                       warmup = 1000,
                       sample_prior = TRUE,
                       future = TRUE,
                       seed = 0)
summary(m_test_acc_brms)
 Family: bernoulli 
  Links: mu = logit 
Formula: correct ~ gamified + gamified:exp_group_c + gamified:gamified_first_c + gamified:gamified_first_c:exp_group_c + (1 | subject) + (1 | fact) 
   Data: d_test_m (Number of observations: 7024) 
  Draws: 4 chains, each with iter = 11000; warmup = 1000; thin = 1;
         total post-warmup draws = 40000

Multilevel Hyperparameters:
~fact (Number of levels: 70) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.60      0.06     0.49     0.72 1.00     9887    18255

~subject (Number of levels: 152) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.84      0.06     0.72     0.97 1.00    10168    18058

Regression Coefficients:
                                           Estimate Est.Error l-95% CI u-95% CI
Intercept                                     -0.02      0.11    -0.22     0.19
gamifiedTRUE                                   0.01      0.05    -0.10     0.11
gamifiedFALSE:exp_group_c                     -0.15      0.15    -0.45     0.15
gamifiedTRUE:exp_group_c                      -0.17      0.15    -0.48     0.13
gamifiedFALSE:gamified_first_c                 0.17      0.15    -0.13     0.47
gamifiedTRUE:gamified_first_c                 -0.21      0.15    -0.51     0.10
gamifiedFALSE:exp_group_c:gamified_first_c    -0.14      0.29    -0.72     0.44
gamifiedTRUE:exp_group_c:gamified_first_c     -0.18      0.29    -0.76     0.40
                                           Rhat Bulk_ESS Tail_ESS
Intercept                                  1.00     6025    11946
gamifiedTRUE                               1.00    56931    30687
gamifiedFALSE:exp_group_c                  1.00     6665    13012
gamifiedTRUE:exp_group_c                   1.00     6751    13442
gamifiedFALSE:gamified_first_c             1.00     6871    14131
gamifiedTRUE:gamified_first_c              1.00     6895    14193
gamifiedFALSE:exp_group_c:gamified_first_c 1.00     7791    15057
gamifiedTRUE:exp_group_c:gamified_first_c  1.00     7762    14611

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Inspect the posterior sample distributions of the fixed effects:

plot(m_test_acc_brms, nvariables = 8, variable = "^b_", regex = TRUE)

Bayes factors

Test the hypothesis that each fixed-effect coefficient in the model (both main effects and interactions) is equal to zero. The column Evid.Ratio shows the Bayes factor in favour of the null hypothesis (\(BF_{01}\)).

h_test_acc <- hypothesis(m_test_acc_brms,
                         c("gamifiedTRUE = 0",
                           "gamifiedFALSE:exp_group_c = 0",
                           "gamifiedTRUE:exp_group_c = 0",
                           "gamifiedFALSE:gamified_first_c = 0",
                           "gamifiedTRUE:gamified_first_c = 0",
                           "gamifiedFALSE:exp_group_c:gamified_first_c = 0",
                           "gamifiedTRUE:exp_group_c:gamified_first_c = 0"),
                         class = "b")

h_test_acc$hypothesis |>
  mutate(BF10 = 1 / Evid.Ratio,
         evidence_for_null = sapply(Evid.Ratio, bf_to_strength))

This hypothesis test calculates the Savage-Dickey density ratio at zero: the ratio of the posterior density at zero to the prior density at zero (indicated by the dashed vertical line). Values above 1 indicate a stronger belief that the effect is indeed zero after having observed the data.

sd_ratio_acc <- plot(h_test_acc, nvariables = 8, variable = "^b_", regex = TRUE, plot = FALSE)

sd_ratio_acc[[1]] +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey25")
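The same ratio can be computed directly from prior and posterior draws by estimating each density at zero. A self-contained sketch with simulated draws (hypothetical numbers, not the fitted model):

```r
set.seed(1)
# Stand-ins for a coefficient's prior and posterior samples
prior_draws     <- rnorm(40000, mean = 0,    sd = 1)     # Normal(0, 1) prior
posterior_draws <- rnorm(40000, mean = 0.01, sd = 0.05)  # concentrated near zero

# Kernel density estimate evaluated at zero
dens_at_zero <- function(draws) {
  d <- density(draws)
  approx(d$x, d$y, xout = 0)$y
}

# Savage-Dickey ratio: BF01 = posterior density at 0 / prior density at 0
bf01 <- dens_at_zero(posterior_draws) / dens_at_zero(prior_draws)
bf01  # well above 1: the data increase belief in a zero effect
```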

Conclusion

The Bayesian model finds anecdotal to strong evidence in favour of the null hypothesis (no effect on posttest accuracy) for each of the regression coefficients.

Response time

Response times are analysed for correct answers only.

Prepare data

To keep the visualisation of average response times by condition simple, we calculate the median RT per participant, and then take the mean and SD of these medians (which are themselves roughly normally distributed).

d_test_rt <- d_test |>
  filter(correct) |>
  mutate(rt = rt / 1000) |>
  group_by(subject, exp_group, block, condition, gamified, gamified_first) |>
  summarise(rt_median = median(rt, na.rm = TRUE))
d_test_rt_agg <- d_test_rt |>
  group_by(block, condition, gamified, gamified_first, exp_group) |>
  summarise(rt_mean = mean(rt_median, na.rm = TRUE),
            rt_se = sd(rt_median, na.rm = TRUE)/sqrt(n())) |>
  ungroup() |>
  add_experiment_cols()
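Taking medians first protects the aggregate against occasional very slow responses within a participant; a toy example with made-up numbers:

```r
# One extreme RT for s1 barely moves that participant's median
toy <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  rt = c(2.0, 2.1, 2.2, 12.0,  2.9, 3.0, 3.1, 3.2)
)
meds <- tapply(toy$rt, toy$subject, median)
meds          # s1: 2.15, s2: 3.05
mean(meds)    # 2.6, unaffected by the 12 s outlier
```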

Visualise data

p_test_rt <- plot_data(d_test_rt_agg, rt_mean, rt_se, "Response time (s)") +
  scale_y_continuous(limits = c(3, 6), labels = scales::comma_format())

p_test_rt

Fit model

Since RT data are not normally distributed, we fit a lognormal model to the response times (see https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#gamma-glmms). Prepare data for modelling by mean-centering categorical predictors:

d_test_rt_m <- d_test |>
  filter(correct) |>
  mutate(log_rt = log(rt / 1000)) |>
  mutate(exp_group_c = ifelse(exp_group == "score", 0, 1),
         exp_group_c = exp_group_c - mean(exp_group_c),
         gamified_first_c = gamified_first - mean(gamified_first))
m_test_rt <- lmer(log_rt ~ gamified +
                      gamified:exp_group_c +
                      gamified:gamified_first_c +
                      gamified:gamified_first_c:exp_group_c +
                      (1 | subject) + (1 | fact),
                    data = d_test_rt_m)
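Fitting a Gaussian mixed model to log RTs is equivalent to a lognormal model for the raw RTs. A small simulation (our own illustration, not the study data) shows how the log transform removes the right skew typical of RT distributions:

```r
set.seed(2)
rt_sim <- rlnorm(5000, meanlog = 1.5, sdlog = 0.5)  # right-skewed "RTs" in seconds

# Sample skewness: positive for raw RTs, near zero after the log transform
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skewness(rt_sim)       # clearly positive
skewness(log(rt_sim))  # close to zero
```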

summary(m_test_rt)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: 
log_rt ~ gamified + gamified:exp_group_c + gamified:gamified_first_c +  
    gamified:gamified_first_c:exp_group_c + (1 | subject) + (1 |      fact)
   Data: d_test_rt_m

REML criterion at convergence: 5381.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9689 -0.6786 -0.1696  0.5261  4.8408 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 0.07237  0.2690  
 fact     (Intercept) 0.01968  0.1403  
 Residual             0.22400  0.4733  
Number of obs: 3673, groups:  subject, 152; fact, 70

Fixed effects:
                                            Estimate Std. Error        df
(Intercept)                                1.463e+00  3.028e-02 2.065e+02
gamifiedTRUE                               7.858e-03  1.611e-02 3.515e+03
gamifiedFALSE:exp_group_c                  1.367e-02  5.045e-02 1.720e+02
gamifiedTRUE:exp_group_c                   4.380e-02  5.067e-02 1.736e+02
gamifiedFALSE:gamified_first_c             6.011e-02  5.053e-02 1.708e+02
gamifiedTRUE:gamified_first_c              1.174e-01  5.064e-02 1.711e+02
gamifiedFALSE:exp_group_c:gamified_first_c 2.316e-01  1.015e-01 1.723e+02
gamifiedTRUE:exp_group_c:gamified_first_c  1.980e-01  1.017e-01 1.727e+02
                                           t value Pr(>|t|)    
(Intercept)                                 48.315   <2e-16 ***
gamifiedTRUE                                 0.488   0.6257    
gamifiedFALSE:exp_group_c                    0.271   0.7867    
gamifiedTRUE:exp_group_c                     0.864   0.3885    
gamifiedFALSE:gamified_first_c               1.190   0.2358    
gamifiedTRUE:gamified_first_c                2.318   0.0216 *  
gamifiedFALSE:exp_group_c:gamified_first_c   2.282   0.0237 *  
gamifiedTRUE:exp_group_c:gamified_first_c    1.946   0.0533 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
              (Intr) gmTRUE gmfdFALSE:x__ gmfdTRUE:x__ gmfdFALSE:g__
gamifidTRUE   -0.262                                                
gmfdFALSE:x__ -0.040  0.000                                         
gmfdTRUE:x__  -0.040  0.006  0.792                                  
gmfdFALSE:g__ -0.028  0.018 -0.005        -0.016                    
gmfdTRUE:g__  -0.018  0.018 -0.016         0.002        0.796       
gFALSE:__:_   -0.004 -0.016 -0.036        -0.028       -0.048       
gTRUE:__:__   -0.014  0.029 -0.027        -0.008       -0.045       
              gmfdTRUE:g__ gFALSE:__:
gamifidTRUE                          
gmfdFALSE:x__                        
gmfdTRUE:x__                         
gmfdFALSE:g__                        
gmfdTRUE:g__                         
gFALSE:__:_   -0.046                 
gTRUE:__:__   -0.046        0.792    
print_model_table(m_test_rt)

| Effect | Estimate | SE | df | t-value | p-value |
|---|---|---|---|---|---|
| Intercept (Control) | 1.463 | 0.030 | 206.46 | 48.32 | <.001 |
| Gamification | 0.008 | 0.016 | 3,514.91 | 0.49 | .626 |
| Type of gamification (Points-and-progress - Points) | 0.044 | 0.051 | 173.57 | 0.86 | .389 |
| Type of gamification on Control (Points-and-progress - Points) | 0.014 | 0.050 | 171.97 | 0.27 | .787 |
| Block on gamification (1 - 2) | 0.117 | 0.051 | 171.14 | 2.32 | .022 |
| Block on Control (2 - 1) | 0.060 | 0.051 | 170.84 | 1.19 | .236 |
| Type of gamification; block difference (Block 1 - Block 2) | 0.198 | 0.102 | 172.68 | 1.95 | .053 |
| Type of gamification on Control; block difference (Block 2 - Block 1) | 0.232 | 0.101 | 172.31 | 2.28 | .024 |

Fitted values

d_model_fit <- crossing(
  gamified = TRUE, 
  exp_group_c = 0, 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt,
                                 newdata = d_model_fit,
                                 re.form = NA, 
                                 type = "response") |>
  exp() # Transform logRT to RT

d_model_fit
d_model_fit <- crossing(
  gamified = FALSE, 
  exp_group_c = sort(unique(d_test_rt_m$exp_group_c)), 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt,
                                 newdata = d_model_fit,
                                 re.form = NA, 
                                 type = "response") |>
  exp() # Transform logRT to RT

d_model_fit

Visualise fitted model

p_test_rt_m <- plot_model_fit(m_test_rt, d_test_rt_m, exp_trans = TRUE, y_lab = "Response time (s)") +
  scale_y_continuous(limits = c(3, 6), labels = scales::comma_format())
  block           condition gamified gamified_first exp_group gamified_first_c
1     1             Control    FALSE          FALSE      both       -0.5363463
2     1             Control    FALSE          FALSE     score       -0.5363463
3     1              Points     TRUE           TRUE     score        0.4636537
4     1 Points-and-progress     TRUE           TRUE      both        0.4636537
5     2             Control    FALSE           TRUE      both        0.4636537
6     2             Control    FALSE           TRUE     score        0.4636537
7     2              Points     TRUE          FALSE     score       -0.5363463
8     2 Points-and-progress     TRUE          FALSE      both       -0.5363463
  exp_group_c pred_val     exp_order     type
1   0.5722842 3.924932  Control—Both  Control
2  -0.4277158 4.383624 Control—Score  Control
3  -0.4277158 4.336746 Score—Control Gamified
4   0.5722842 4.966467  Both—Control Gamified
5   0.5722842 4.758713  Both—Control  Control
6  -0.4277158 4.216226 Score—Control  Control
7  -0.4277158 4.197234 Control—Score Gamified
8   0.5722842 3.943450  Control—Both Gamified
p_test_rt_m

Bayesian regression model

Fit again with brms so that we can calculate Bayes Factors. Because we expect any fixed effects to be small on the log scale, we will use a weakly informative Normal(0, 0.1) prior for these effects.

m_test_rt_brms <- brm(log_rt ~ gamified +
                         gamified:exp_group_c +
                         gamified:gamified_first_c +
                         gamified:gamified_first_c:exp_group_c +
                         (1 | subject) + (1 | fact),
                       family = "gaussian",
                       data = d_test_rt_m,
                       prior = set_prior("normal(0, .1)", class = "b"),
                       chains = 4,
                       iter = 11000,
                       warmup = 1000,
                       sample_prior = TRUE,
                       future = TRUE,
                       seed = 0)
Chain 3: Iteration:  3200 / 11000 [ 29%]  (Sampling)
Chain 3: Iteration:  4300 / 11000 [ 39%]  (Sampling)
Chain 3: Iteration:  5400 / 11000 [ 49%]  (Sampling)
Chain 3: Iteration:  6500 / 11000 [ 59%]  (Sampling)
Chain 3: Iteration:  7600 / 11000 [ 69%]  (Sampling)
Chain 3: Iteration:  8700 / 11000 [ 79%]  (Sampling)
Chain 3: Iteration:  9800 / 11000 [ 89%]  (Sampling)
Chain 3: Iteration: 10900 / 11000 [ 99%]  (Sampling)
Chain 3: Iteration: 11000 / 11000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 5.154 seconds (Warm-up)
Chain 3:                26.32 seconds (Sampling)
Chain 3:                31.474 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.000179 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.79 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:     1 / 11000 [  0%]  (Warmup)
Chain 4: Iteration:  1001 / 11000 [  9%]  (Sampling)
Chain 4: Iteration:  2100 / 11000 [ 19%]  (Sampling)
Chain 4: Iteration:  3200 / 11000 [ 29%]  (Sampling)
Chain 4: Iteration:  4300 / 11000 [ 39%]  (Sampling)
Chain 4: Iteration:  5400 / 11000 [ 49%]  (Sampling)
Chain 4: Iteration:  6500 / 11000 [ 59%]  (Sampling)
Chain 4: Iteration:  7600 / 11000 [ 69%]  (Sampling)
Chain 4: Iteration:  8700 / 11000 [ 79%]  (Sampling)
Chain 4: Iteration:  9800 / 11000 [ 89%]  (Sampling)
Chain 4: Iteration: 10900 / 11000 [ 99%]  (Sampling)
Chain 4: Iteration: 11000 / 11000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 5.124 seconds (Warm-up)
Chain 4:                19.236 seconds (Sampling)
Chain 4:                24.36 seconds (Total)
Chain 4: 
summary(m_test_rt_brms)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_rt ~ gamified + gamified:exp_group_c + gamified:gamified_first_c + gamified:gamified_first_c:exp_group_c + (1 | subject) + (1 | fact) 
   Data: d_test_rt_m (Number of observations: 3673) 
  Draws: 4 chains, each with iter = 11000; warmup = 1000; thin = 1;
         total post-warmup draws = 40000

Multilevel Hyperparameters:
~fact (Number of levels: 70) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.14      0.02     0.12     0.18 1.00    10785    18637

~subject (Number of levels: 152) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.27      0.02     0.24     0.31 1.00     7496    13982

Regression Coefficients:
                                           Estimate Est.Error l-95% CI u-95% CI
Intercept                                      1.46      0.03     1.40     1.53
gamifiedTRUE                                   0.01      0.02    -0.02     0.04
gamifiedFALSE:exp_group_c                      0.01      0.04    -0.08     0.09
gamifiedTRUE:exp_group_c                       0.03      0.04    -0.05     0.12
gamifiedFALSE:gamified_first_c                 0.04      0.04    -0.05     0.12
gamifiedTRUE:gamified_first_c                  0.09      0.04     0.01     0.17
gamifiedFALSE:exp_group_c:gamified_first_c     0.09      0.06    -0.04     0.22
gamifiedTRUE:exp_group_c:gamified_first_c      0.06      0.07    -0.07     0.19
                                           Rhat Bulk_ESS Tail_ESS
Intercept                                  1.00     5821    10898
gamifiedTRUE                               1.00    44637    31503
gamifiedFALSE:exp_group_c                  1.00     7546    14751
gamifiedTRUE:exp_group_c                   1.00     7609    15045
gamifiedFALSE:gamified_first_c             1.00     6795    14173
gamifiedTRUE:gamified_first_c              1.00     6834    13035
gamifiedFALSE:exp_group_c:gamified_first_c 1.00    12457    22090
gamifiedTRUE:exp_group_c:gamified_first_c  1.00    12397    21890

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.47      0.01     0.46     0.48 1.00    45311    30732

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Inspect the posterior sample distributions of the fixed effects:

plot(m_test_rt_brms, nvariables = 8, variable = "^b_", regex = TRUE)

Bayes factors

Test the hypothesis that each fixed-effect coefficient in the model (both main effects and interactions) is equal to zero. The column Evid.Ratio shows the Bayes Factor in favour of the null hypothesis (\(BF_{01}\)).

h_test_rt <- hypothesis(m_test_rt_brms,
                         c("gamifiedTRUE = 0",
                           "gamifiedFALSE:exp_group_c = 0",
                           "gamifiedTRUE:exp_group_c = 0",
                           "gamifiedFALSE:gamified_first_c = 0",
                           "gamifiedTRUE:gamified_first_c = 0",
                           "gamifiedFALSE:exp_group_c:gamified_first_c = 0",
                           "gamifiedTRUE:exp_group_c:gamified_first_c = 0"),
                         class = "b")


h_test_rt$hypothesis |>
  mutate(BF10 = 1 / Evid.Ratio,
         evidence_for_null = sapply(Evid.Ratio, bf_to_strength))

This hypothesis test is calculating the Savage-Dickey density ratio at zero, which is a ratio of the posterior density at zero relative to the prior density at zero (indicated by dashed vertical line). Values above 1 indicate a stronger belief that the effect is indeed zero after having observed the data.

sd_ratio_rt <- plot(h_test_rt, nvariables = 8, variable = "^b_", regex = TRUE, plot = FALSE)

sd_ratio_rt[[1]] +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey25")

Conclusion

The Bayesian model finds evidence in favour of the null hypothesis (no effect on posttest response time) for all but two of the regression coefficients. There is moderate evidence for an order effect in the gamified conditions, as well as anecdotal evidence for an order effect that depended on gamification type in the control condition (gamifiedFALSE:exp_group_c:gamified_first_c).

Fitted values

d_model_fit <- crossing(
  gamified = TRUE, 
  exp_group_c = 0, 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt_brms,
                                 newdata = d_model_fit,
                                 re_formula = NA)[,1] |>
  exp() # Transform logRT to RT

d_model_fit

d_model_fit <- crossing(
  gamified = FALSE, 
  exp_group_c = sort(unique(d_test_rt_m$exp_group_c)), 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt_brms,
                                 newdata = d_model_fit,
                                 re_formula = NA)[,1] |>
  exp() # Transform logRT to RT

d_model_fit

Conclusions

  • Gamified feedback had no effect on response accuracy on the test.
  • Gamified feedback had no effect on response time on correct answers on the test. One exception: there was anecdotal evidence that, in the Control condition, the difference in RT between the two experimental groups (Points vs Progress bar) changed between blocks.

Combined plot

(p_test_acc | p_test_rt) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "a")

ggsave(here("output", "test_performance.png"), width = 7.5, height = 3)

Save hypothesis testing output for visualisation

fwrite(h_test_acc$hypothesis, here("output", "hypothesis_tests", "h_test_acc.csv"))
fwrite(h_test_rt$hypothesis, here("output", "hypothesis_tests", "h_test_rt.csv"))

Session info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstan_2.32.6       StanHeaders_2.32.6 flextable_0.9.6    data.table_1.14.8 
 [5] future_1.33.0      brms_2.21.0        Rcpp_1.0.11        lmerTest_3.1-3    
 [9] lme4_1.1-34        Matrix_1.6-1.1     tidyr_1.3.0        stringr_1.5.0     
[13] patchwork_1.3.0    scales_1.3.0       ggplot2_3.5.0      dplyr_1.1.3       
[17] here_1.0.1        

loaded via a namespace (and not attached):
  [1] gridExtra_2.3           inline_0.3.19           sandwich_3.0-2         
  [4] rlang_1.1.1             magrittr_2.0.3          multcomp_1.4-25        
  [7] matrixStats_1.3.0       compiler_4.3.1          loo_2.7.0              
 [10] reshape2_1.4.4          systemfonts_1.0.4       callr_3.7.3            
 [13] vctrs_0.6.3             httpcode_0.3.0          pkgconfig_2.0.3        
 [16] crayon_1.5.2            fastmap_1.1.1           ellipsis_0.3.2         
 [19] backports_1.4.1         labeling_0.4.3          utf8_1.2.3             
 [22] promises_1.2.1          rmarkdown_2.25          ps_1.7.5               
 [25] nloptr_2.0.3            ragg_1.2.7              purrr_1.0.2            
 [28] xfun_0.40               cachem_1.0.8            jsonlite_1.8.7         
 [31] later_1.3.1             uuid_1.2-0              parallel_4.3.1         
 [34] prettyunits_1.2.0       R6_2.5.1                bslib_0.5.1            
 [37] stringi_1.7.12          parallelly_1.36.0       boot_1.3-28.1          
 [40] jquerylib_0.1.4         numDeriv_2016.8-1.1     estimability_1.4.1     
 [43] knitr_1.44              zoo_1.8-12              bayesplot_1.11.1       
 [46] httpuv_1.6.12           splines_4.3.1           tidyselect_1.2.0       
 [49] rstudioapi_0.15.0       abind_1.4-5             yaml_2.3.7             
 [52] codetools_0.2-19        curl_5.1.0              processx_3.8.2         
 [55] listenv_0.9.0           pkgbuild_1.4.2          plyr_1.8.9             
 [58] lattice_0.21-9          tibble_3.2.1            shiny_1.8.0            
 [61] withr_2.5.1             bridgesampling_1.1-2    askpass_1.2.0          
 [64] posterior_1.5.0         coda_0.19-4             evaluate_0.22          
 [67] survival_3.5-7          RcppParallel_5.1.7      zip_2.3.1              
 [70] xml2_1.3.5              pillar_1.9.0            tensorA_0.36.2.1       
 [73] checkmate_2.3.1         stats4_4.3.1            distributional_0.4.0   
 [76] generics_0.1.3          rprojroot_2.0.3         rstantools_2.4.0       
 [79] munsell_0.5.0           minqa_1.2.6             globals_0.16.2         
 [82] xtable_1.8-4            glue_1.6.2              gdtools_0.3.7          
 [85] emmeans_1.8.9           tools_4.3.1             gfonts_0.2.0           
 [88] mvtnorm_1.2-3           grid_4.3.1              QuickJSR_1.1.3         
 [91] colorspace_2.1-0        nlme_3.1-163            cli_3.6.1              
 [94] textshaping_0.3.7       officer_0.6.6           fontBitstreamVera_0.1.1
 [97] fansi_1.0.4             Brobdingnag_1.2-9       V8_4.3.3               
[100] gtable_0.3.4            sass_0.4.7              digest_0.6.33          
[103] fontquiver_0.2.1        crul_1.4.2              TH.data_1.1-2          
[106] farver_2.1.1            htmltools_0.5.6         lifecycle_1.0.3        
[109] mime_0.12               openssl_2.1.1           fontLiberation_0.1.0   
[112] MASS_7.3-60            
---
title: "Analysis: posttest" 
subtitle: "Gamified Feedback in Adaptive Retrieval Practice"
author: "Maarten van der Velde & Gesa van den Broek"
date: "Last updated: `r Sys.Date()`"
output:
  html_notebook:
    smart: no
    toc: yes
    toc_float: yes
  github_document:
    toc: yes
editor_options: 
  chunk_output_type: inline
---

```{r setup, include = FALSE}
# Remove to disable caching
knitr::opts_chunk$set(cache = TRUE)
```

# Setup

```{r}
library(here)
library(dplyr)
library(ggplot2)
library(scales)
library(patchwork)
library(stringr)
library(tidyr)
library(lme4)
library(lmerTest)
library(brms)

# Set up parallel processing for Bayesian models
library(future)
plan(multisession, workers = 4)
```

Helper functions for plots and tables:
```{r}
source(here("scripts", "00_visualisation_functions.R"))
```

Load processed data:
```{r}
d_test <- readRDS(here("data", "processed", "d_test.rds"))
```

```{r}
add_experiment_cols <- function (data) {
  data |>
    mutate(exp_order = case_when(
      gamified_first == 0 & exp_group == "score" ~ "Control—Score",
      gamified_first == 0 & exp_group == "both" ~ "Control—Both",
      gamified_first == 1 & exp_group == "score" ~ "Score—Control",
      gamified_first == 1 & exp_group == "both" ~ "Both—Control"
    )) |>
    mutate(type = ifelse(gamified, "Gamified", "Control"))
}
```
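
A quick toy example (illustrative only, not part of the analysis) of what this helper produces:
```{r}
# Toy input: one row per ordering/group combination
tibble(gamified_first = c(0, 1),
       exp_group = c("score", "both"),
       gamified = c(FALSE, TRUE)) |>
  add_experiment_cols()
# exp_order becomes "Control—Score" / "Both—Control";
# type becomes "Control" / "Gamified"
```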

Helper function for interpreting Bayes factors:
```{r}
bf_to_strength <- function (bf) {
  
  direction <- "for"
  
  if (bf < 1) {
    bf <- 1/bf
    direction <- "against"
  }
  
  strength <- case_when(
    bf == 1 ~ "No",
    bf < 3 ~ "Anecdotal",
    bf < 10 ~ "Moderate",
    bf < 30 ~ "Strong",
    bf < 100 ~ "Very strong",
    TRUE ~ "Extreme"
  )
  
  paste0(strength, " evidence ", direction)
}
```
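
For example:
```{r}
bf_to_strength(5)     # "Moderate evidence for"
bf_to_strength(1/20)  # "Strong evidence against"
```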

# Does gamification change learning outcomes on the test?

## Accuracy 

### Prepare data
```{r}
d_test_acc <- d_test |>
  group_by(subject, exp_group, block, condition, gamified, gamified_first) |>
  summarise(accuracy = mean(correct))

d_test_acc_agg <- d_test_acc |>
  group_by(block, condition, gamified, gamified_first, exp_group) |>
  summarise(acc = mean(accuracy, na.rm = T),
            acc_se = sd(accuracy, na.rm = T)/sqrt(n())) |>
  ungroup() |>
  add_experiment_cols()
```

### Visualise data

```{r}
p_test_acc <- plot_data(d_test_acc_agg, acc, acc_se, "Accuracy") +
  scale_y_continuous(limits = c(.35, .6), labels = scales::percent_format())

p_test_acc
```


### Fit model

Prepare data for modelling by mean-centering categorical predictors:
```{r}
d_test_m <- d_test |>
  mutate(exp_group_c = ifelse(exp_group == "score", 0, 1),
         exp_group_c = exp_group_c - mean(exp_group_c),
         gamified_first_c = gamified_first - mean(gamified_first))
```
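
A toy illustration of the centering: with a balanced 0/1 indicator, the centered codes become -0.5/+0.5, so the `gamified` effects are estimated at the average of both groups rather than at one reference level.
```{r}
x <- c(0, 0, 1, 1)
x - mean(x)  # -0.5 -0.5  0.5  0.5
```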


```{r}
m_test_acc <- glmer(correct ~ gamified +
                       gamified:exp_group_c +
                       gamified:gamified_first_c +
                       gamified:gamified_first_c:exp_group_c +
                       (1 | subject) + (1 | fact),
                     family = "binomial",
                     data = d_test_m)

summary(m_test_acc)
print_model_table(m_test_acc)
```



### Visualise fitted model

```{r}
p_test_acc_m <- plot_model_fit(m_test_acc, d_test_m, y_lab = "Accuracy") +
  scale_y_continuous(limits = c(.35, .6), labels = scales::percent_format(accuracy = .1))

p_test_acc_m
```

### Bayesian regression model

Fit again with `brms` so that we can calculate Bayes Factors.
Because we expect any fixed effects to be at most moderate in size, we will use a weakly informative Normal(0, 1) prior for these effects.
```{r}
m_test_acc_brms <- brm(correct ~ gamified +
                         gamified:exp_group_c +
                         gamified:gamified_first_c +
                         gamified:gamified_first_c:exp_group_c +
                         (1 | subject) + (1 | fact),
                       family = "bernoulli",
                       data = d_test_m,
                       prior = set_prior("normal(0, 1)", class = "b"),
                       chains = 4,
                       iter = 11000,
                       warmup = 1000,
                       sample_prior = TRUE,
                       future = TRUE,
                       seed = 0)

summary(m_test_acc_brms)
```

Inspect the posterior sample distributions of the fixed effects:
```{r, fig.height = 12, fig.width = 8}
plot(m_test_acc_brms, nvariables = 8, variable = "^b_", regex = TRUE)
```

#### Bayes factors

Test the hypothesis that each fixed-effect coefficient in the model (both main effects and interactions) is equal to zero.
The column `Evid.Ratio` shows the Bayes Factor in favour of the null hypothesis ($BF_{01}$).
```{r}
h_test_acc <- hypothesis(m_test_acc_brms,
                         c("gamifiedTRUE = 0",
                           "gamifiedFALSE:exp_group_c = 0",
                           "gamifiedTRUE:exp_group_c = 0",
                           "gamifiedFALSE:gamified_first_c = 0",
                           "gamifiedTRUE:gamified_first_c = 0",
                           "gamifiedFALSE:exp_group_c:gamified_first_c = 0",
                           "gamifiedTRUE:exp_group_c:gamified_first_c = 0"),
                         class = "b")

h_test_acc$hypothesis |>
  mutate(BF10 = 1 / Evid.Ratio,
         evidence_for_null = sapply(Evid.Ratio, bf_to_strength))
```

This hypothesis test is calculating the Savage-Dickey density ratio at zero, which is a ratio of the posterior density at zero relative to the prior density at zero (indicated by dashed vertical line).
Values above 1 indicate a stronger belief that the effect is indeed zero after having observed the data.  
```{r, fig.height = 12, fig.width = 6}
sd_ratio_acc <- plot(h_test_acc, nvariables = 8, variable = "^b_", regex = TRUE, plot = FALSE)

sd_ratio_acc[[1]] +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey25")
```
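
As an illustrative cross-check (not part of the analysis), the Savage-Dickey ratio can be approximated by hand: divide the posterior density at zero by the prior density at zero. The sketch below assumes a normal approximation to the posterior, so it will only roughly match the density-based `Evid.Ratio` reported by `hypothesis()`.
```{r}
b <- as_draws_df(m_test_acc_brms)$b_gamifiedTRUE
post_at_0 <- dnorm(0, mean = mean(b), sd = sd(b))  # approx. posterior density at 0
prior_at_0 <- dnorm(0, mean = 0, sd = 1)           # fixed-effect prior was normal(0, 1)
post_at_0 / prior_at_0                             # approximate BF01
```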

#### Conclusion

The Bayesian model finds anecdotal to strong evidence in favour of the null hypothesis (no effect on posttest accuracy) for each of the regression coefficients.



## Response time

Response time on correct answers only.

### Prepare data

To keep the visualisation of average response times by condition simple, we calculate the median RT per participant, and then take the mean and SD of these medians (which are themselves roughly normally distributed).
```{r}
d_test_rt <- d_test |>
  filter(correct) |>
  mutate(rt = rt / 1000) |>
  group_by(subject, exp_group, block, condition, gamified, gamified_first) |>
  summarise(rt_median = median(rt, na.rm = TRUE))

d_test_rt_agg <- d_test_rt |>
  group_by(block, condition, gamified, gamified_first, exp_group) |>
  summarise(rt_mean = mean(rt_median, na.rm = T),
            rt_se = sd(rt_median, na.rm = T)/sqrt(n())) |>
  ungroup() |>
  add_experiment_cols()
```

### Visualise data

```{r}
p_test_rt <- plot_data(d_test_rt_agg, rt_mean, rt_se, "Response time (s)") +
  scale_y_continuous(limits = c(3, 6), labels = scales::comma_format())

p_test_rt
```


### Fit model

Since RT data are not normally distributed, we fit a lognormal model to the response times (i.e., a linear mixed-effects model on log-transformed RTs).
(See https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#gamma-glmms .)
Prepare data for modelling by mean-centering categorical predictors:
```{r}
d_test_rt_m <- d_test |>
  filter(correct) |>
  mutate(log_rt = log(rt / 1000)) |>
  mutate(exp_group_c = ifelse(exp_group == "score", 0, 1),
         exp_group_c = exp_group_c - mean(exp_group_c),
         gamified_first_c = gamified_first - mean(gamified_first))
```
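
A quick visual sanity check (illustrative): raw RTs are right-skewed, while log RTs are roughly symmetric, which motivates modelling on the log scale.
```{r}
hist(d_test_rt_m$log_rt, breaks = 40, main = "log RT (correct answers)")
```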


```{r}
m_test_rt <- lmer(log_rt ~ gamified +
                      gamified:exp_group_c +
                      gamified:gamified_first_c +
                      gamified:gamified_first_c:exp_group_c +
                      (1 | subject) + (1 | fact),
                    data = d_test_rt_m)

summary(m_test_rt)
print_model_table(m_test_rt)
```


### Fitted values
```{r}
d_model_fit <- crossing(
  gamified = TRUE, 
  exp_group_c = 0, 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt,
                                 newdata = d_model_fit,
                                 re.form = NA, 
                                 type = "response") |>
  exp() # Transform logRT to RT

d_model_fit
```

```{r}
d_model_fit <- crossing(
  gamified = FALSE, 
  exp_group_c = sort(unique(d_test_rt_m$exp_group_c)), 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt,
                                 newdata = d_model_fit,
                                 re.form = NA, 
                                 type = "response") |>
  exp() # Transform logRT to RT

d_model_fit
```


### Visualise fitted model

```{r}
p_test_rt_m <- plot_model_fit(m_test_rt, d_test_rt_m, exp_trans = TRUE, y_lab = "Response time (s)") +
  scale_y_continuous(limits = c(3, 6), labels = scales::comma_format())

p_test_rt_m
```

### Bayesian regression model

Fit again with `brms` so that we can calculate Bayes Factors.
Because we expect any fixed effects on the log-RT scale to be small, we will use a weakly informative Normal(0, 0.1) prior for these effects.
```{r}
m_test_rt_brms <- brm(log_rt ~ gamified +
                         gamified:exp_group_c +
                         gamified:gamified_first_c +
                         gamified:gamified_first_c:exp_group_c +
                         (1 | subject) + (1 | fact),
                       family = "gaussian",
                       data = d_test_rt_m,
                       prior = set_prior("normal(0, .1)", class = "b"),
                       chains = 4,
                       iter = 11000,
                       warmup = 1000,
                       sample_prior = TRUE,
                       future = TRUE,
                       seed = 0)

summary(m_test_rt_brms)
```

Inspect the posterior sample distributions of the fixed effects:
```{r, fig.height = 12, fig.width = 8}
plot(m_test_rt_brms, nvariables = 8, variable = "^b_", regex = TRUE)
```


#### Bayes factors

Test the hypothesis that each fixed-effect coefficient in the model (both main effects and interactions) is equal to zero.
The column `Evid.Ratio` shows the Bayes Factor in favour of the null hypothesis ($BF_{01}$).
```{r}
h_test_rt <- hypothesis(m_test_rt_brms,
                         c("gamifiedTRUE = 0",
                           "gamifiedFALSE:exp_group_c = 0",
                           "gamifiedTRUE:exp_group_c = 0",
                           "gamifiedFALSE:gamified_first_c = 0",
                           "gamifiedTRUE:gamified_first_c = 0",
                           "gamifiedFALSE:exp_group_c:gamified_first_c = 0",
                           "gamifiedTRUE:exp_group_c:gamified_first_c = 0"),
                         class = "b")


h_test_rt$hypothesis |>
  mutate(BF10 = 1 / Evid.Ratio,
         evidence_for_null = sapply(Evid.Ratio, bf_to_strength))
```

This hypothesis test is calculating the Savage-Dickey density ratio at zero, which is a ratio of the posterior density at zero relative to the prior density at zero (indicated by dashed vertical line).
Values above 1 indicate a stronger belief that the effect is indeed zero after having observed the data.  
```{r, fig.height = 12, fig.width = 6}
sd_ratio_rt <- plot(h_test_rt, nvariables = 8, variable = "^b_", regex = TRUE, plot = FALSE)

sd_ratio_rt[[1]] +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey25")
```

#### Conclusion

The Bayesian model finds evidence in favour of the null hypothesis (no effect on posttest response time) for all but two of the regression coefficients.
There is moderate evidence for an order effect in the gamified conditions, as well as anecdotal evidence for an order effect that depended on gamification type in the control condition (`gamifiedFALSE:exp_group_c:gamified_first_c`).


### Fitted values
```{r}
d_model_fit <- crossing(
  gamified = TRUE, 
  exp_group_c = 0, 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt_brms,
                                 newdata = d_model_fit,
                                 re_formula = NA)[,1] |>
  exp() # Transform logRT to RT

d_model_fit
```




```{r}
d_model_fit <- crossing(
  gamified = FALSE, 
  exp_group_c = sort(unique(d_test_rt_m$exp_group_c)), 
  gamified_first_c = sort(unique(d_test_rt_m$gamified_first_c))
)

d_model_fit$model_fit <- predict(m_test_rt_brms,
                                 newdata = d_model_fit,
                                 re_formula = NA)[,1] |>
  exp() # Transform logRT to RT

d_model_fit
```

## Conclusions

- Gamified feedback had no effect on response accuracy on the test.
- Gamified feedback had no effect on response time on correct answers on the test. One exception: there was anecdotal evidence that, in the Control condition, the difference in RT between the two experimental groups (Points vs Progress bar) changed between blocks.



## Combined plot

```{r}
(p_test_acc | p_test_rt) +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "a")

ggsave(here("output", "test_performance.png"), width = 7.5, height = 3)
```

# Save hypothesis testing output for visualisation
```{r}
fwrite(h_test_acc$hypothesis, here("output", "hypothesis_tests", "h_test_acc.csv"))
fwrite(h_test_rt$hypothesis, here("output", "hypothesis_tests", "h_test_rt.csv"))
```

# Session info
```{r}
sessionInfo()
```